Lab4: Classification
Linear Discriminant Analysis
Edgar Anderson’s Iris Data
Species: setosa / versicolor / virginicaSepal.LengthSepal.WidthPetal.LengthPetal.Width[1] 150 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
setosa versicolor virginica
0.33333 0.33333 0.33333
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
If prior is unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
Components of an lda object
prior: the prior probabilities used.means: the group means.scaling : a matrix which transforms observations to discriminant functions, normalized so that within groups covariance matrix is spherical.svd: the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.Call:
lda(Species ~ ., data = training, prior = c(1, 1, 1)/3)
Prior probabilities of groups:
setosa versicolor virginica
0.33333 0.33333 0.33333
Group means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa 5.0075 3.4075 1.4475 0.2325
versicolor 5.9500 2.8100 4.2775 1.3425
virginica 6.6325 2.9900 5.6025 2.0400
Coefficients of linear discriminants:
LD1 LD2
Sepal.Length 0.77233 -0.26786
Sepal.Width 1.43010 -2.12129
Petal.Length -2.01887 0.92191
Petal.Width -3.08463 -2.50828
Proportion of trace:
LD1 LD2
0.993 0.007
lda model$class
[1] virginica versicolor versicolor versicolor versicolor versicolor setosa versicolor
[9] virginica setosa virginica versicolor setosa setosa virginica setosa
[17] setosa versicolor setosa setosa
[ reached getOption("max.print") -- omitted 100 entries ]
Levels: setosa versicolor virginica
$posterior
setosa versicolor virginica
142 5.1938e-36 3.3938e-04 9.9966e-01
51 7.7151e-18 9.9990e-01 1.0203e-04
58 2.3734e-13 1.0000e+00 6.3224e-08
93 2.2139e-17 9.9999e-01 7.7603e-06
75 3.4610e-17 9.9998e-01 1.9679e-05
96 8.7875e-17 9.9999e-01 1.0777e-05
[ reached getOption("max.print") -- omitted 114 rows ]
$x
LD1 LD2
142 -5.20625 -1.868888
51 -1.40229 -0.219119
58 -0.10799 1.753058
93 -1.15701 1.231407
75 -1.17871 0.460048
96 -1.06598 0.594060
2 7.05363 0.735270
86 -2.10175 -1.061552
146 -5.70561 -1.510995
38 8.22015 -0.286676
[ reached getOption("max.print") -- omitted 110 rows ]
ldahist(lda.fit$x[, 1], g = training$Species, type = "both"); title("LD1", cex.main = 2)
ldahist(lda.fit$x[, 2], g = training$Species, type = "both"); title("LD2", cex.main = 2)LDs) LD1 LD2
Sepal.Length 0.77233 -0.26786
Sepal.Width 1.43010 -2.12129
Petal.Length -2.01887 0.92191
Petal.Width -3.08463 -2.50828
X <- model.matrix(Species ~ ., training)[, -1]
center <- t(lda.mod$prior) %*% lda.mod$means
Xc <- scale(X, center = center, scale = FALSE)
LD <- Xc %*% Z
LD LD1 LD2
142 -5.20625 -1.868888
51 -1.40229 -0.219119
58 -0.10799 1.753058
93 -1.15701 1.231407
75 -1.17871 0.460048
96 -1.06598 0.594060
2 7.05363 0.735270
86 -2.10175 -1.061552
146 -5.70561 -1.510995
38 8.22015 -0.286676
[ reached getOption("max.print") -- omitted 110 rows ]
[1] TRUE
Call:
qda(Species ~ ., data = training, prior = c(1, 1, 1)/3)
Prior probabilities of groups:
setosa versicolor virginica
0.33333 0.33333 0.33333
Group means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa 5.0075 3.4075 1.4475 0.2325
versicolor 5.9500 2.8100 4.2775 1.3425
virginica 6.6325 2.9900 5.6025 2.0400
library(ggplot2)
train.2d <- data.frame(lda.fit$x, Species = training$Species)
lda.2d.mod <- lda(Species ~ LD1 + LD2, train.2d)
qda.2d.mod <- qda(Species ~ LD1 + LD2, train.2d)
grid.2d <- expand.grid(lapply(train.2d[1:2], \(x) seq(min(x), max(x), length.out = 1000)))
lda.class <- as.integer(predict(lda.2d.mod, newdata = grid.2d)$class)
qda.class <- as.integer(predict(qda.2d.mod, newdata = grid.2d)$class)
grid.2d.pred <- cbind(grid.2d, lda.class, qda.class)
ggplot(train.2d, aes(x = LD1, y = LD2)) +
geom_point(aes(fill = Species, shape = Species)) +
geom_contour(aes(z = lda.class, colour = "LDA"), grid.2d.pred, breaks = c(1.5, 2.5)) +
geom_contour(aes(z = qda.class, colour = "QDA"), grid.2d.pred, breaks = c(1.5, 2.5)) +
scale_shape_manual(values = 22:24) +
scale_colour_discrete(name = "Classifiers") +
theme_bw() Prediction
Reality setosa versicolor virginica
setosa 10 0 0
versicolor 0 10 0
virginica 0 1 9
[1] 0.96667
Prediction
Reality setosa versicolor virginica
setosa 10 0 0
versicolor 0 9 1
virginica 0 1 9
[1] 0.93333